Introduction


So you want to know the range of products your competitor sells. You go to their website, see all the products along with their details, and want to compare them with your own range. Great! But how do you do that? How do you get the details on the website into a format you can analyse?

Hmmm... If you have these or similar questions on your mind, you have come to the right place. In this post, we will learn about web scraping using R. If you prefer a more structured approach, try our free online course, Web Scraping with R.

The What?


So, what exactly is web scraping, also called web mining or web harvesting? It is a technique for extracting data from websites. Remember, websites contain a wealth of useful data, but they are designed for human consumption, not data analysis. The goal of web scraping is to take advantage of the pattern or structure of web pages to extract the data and store it in a format suitable for analysis.
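
To make the idea of "taking advantage of the structure" concrete, here is a minimal sketch. It assumes rvest 1.0.0 or later, and the product listing, class names, and prices are all made up for illustration. The HTML is written inline so that the example runs without touching any real website.

```r
library(rvest)

# Hypothetical product listing, inlined so no network call is needed
page <- minimal_html('
  <div class="product"><span class="name">Phone A</span><span class="price">$199</span></div>
  <div class="product"><span class="name">Phone B</span><span class="price">$349</span></div>
')

# Every product follows the same pattern, so one CSS selector per field is enough
products <- data.frame(
  name  = html_text2(html_elements(page, ".product .name")),
  price = html_text2(html_elements(page, ".product .price"))
)

# Strip the currency symbol and convert the price text to a number
products$price <- as.numeric(gsub("[^0-9.]", "", products$price))

products
```

The result is an ordinary data frame with one row per product, which is the "format suitable for analysis" we are after. On a real site you would replace `minimal_html()` with `read_html()` and a URL, and the selectors would depend on how that site marks up its pages.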

The Why?


The How?


Use Cases


Things to keep in mind…


Case Studies


HTML Basics


To scrape data from websites, we need to understand how web pages are structured. In this section, we will learn enough HTML to start scraping data from websites.
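
As a quick preview of why page structure matters, the sketch below parses a small, made-up HTML fragment and inspects its tags, attributes, and nesting. It assumes rvest 1.0.0 or later; the `id`, `class`, and link values are hypothetical.

```r
library(rvest)

# Hypothetical fragment: a container element with a heading and two links
page <- minimal_html('
  <div id="catalog" class="listing">
    <h2>Phones</h2>
    <a href="/phones/a" class="item">Phone A</a>
    <a href="/phones/b" class="item">Phone B</a>
  </div>
')

# Locate one element by tag name and id
container <- html_element(page, "div#catalog")

# Tag name and attributes of that element
container_tag   <- html_name(container)
container_class <- html_attr(container, "class")
container_attrs <- html_attrs(container)  # named character vector of all attributes

# Tags of the elements nested directly inside the container
child_tags <- html_name(html_children(container))

# Select the links by tag and class, then read an attribute and the text
links        <- html_elements(container, "a.item")
link_targets <- html_attr(links, "href")
link_labels  <- html_text2(links)
```

Tags, attributes, classes, and nesting are exactly the handles a scraper uses to find data, which is why the rest of this section looks at each of them in turn.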

HTML, CSS & JavaScript


HTML Tags


DOM


HTML Attributes


Class, Div & Style


Libraries


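library(robotstxt)  # check what a site's robots.txt allows
library(rvest)      # read web pages and extract elements
library(selectr)    # translate CSS selectors to XPath
library(xml2)       # parse HTML and XML documents
library(dplyr)      # data manipulation
library(stringr)    # string cleaning
library(forcats)    # factor handling
library(magrittr)   # pipe operators
library(tidyr)      # reshaping data
library(ggplot2)    # plotting
library(lubridate)  # dates and times
library(tibble)     # tidy data frames
library(purrr)      # functional iteration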

Best Selling Mobile Phones


Brand Name


Color


Rating


Number of Reviews


Real Price


Actual Price


IMDB Top 50


Title


Year of Release


Certificate


Runtime


Genre


Rating


XPATH


Votes


Revenue


Top Websites


RBI Governors


Summary